Genome analysis EPGA2: memory-efficient de novo assembler
نویسندگان
چکیده
Motivation: In genome assembly, as coverage of sequencing and genome size growing, most current softwares require a large memory for handling a great deal of sequence data. However, most researchers usually cannot meet the requirements of computing resources which prevent most current softwares from practical applications. Results: In this article, we present an update algorithm called EPGA2, which applies some new modules and can bring about improved assembly results in small memory. For reducing peak memory in genome assembly, EPGA2 adopts memory-efficient DSK to count K-mers and revised BCALM to construct De Bruijn Graph. Moreover, EPGA2 parallels the step of Contigs Merging and adds Errors Correction in its pipeline. Our experiments demonstrate that all these changes in EPGA2 are more useful for genome assembly. Availability and implementation: EPGA2 is publicly available for download at https://github.com/ bioinfomaticsCSU/EPGA2. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
منابع مشابه
Efficient de novo assembly of large genomes using compressed data structures.
De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA...
متن کاملA Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoform...
متن کاملScalable De Novo Genome Assembly Using Pregel
De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables highthroughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit p...
متن کاملTedna: a transposable element de novo assembler
MOTIVATION Recent technological advances are allowing many laboratories to sequence their research organisms. Available de novo assemblers leave repetitive portions of the genome poorly assembled. Some genomes contain high proportions of transposable elements, and transposable elements appear to be a major force behind diversity and adaptation. Few de novo assemblers for transposable elements e...
متن کاملA Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective
Background Current advancements in next-generation sequencing technology have made possible to sequence whole genome but assembling a large number of short sequence reads is still a big challenge. In this article, we present the comparative study of seven assemblers, namely, ABySS, Velvet, Edena, SGA, Ray, SSAKE, and Perga, using prokaryotic and eukaryotic paired-end as well as single-end data ...
متن کامل